Abstract—Cloud controllers aim at responding to application
demands by automatically scaling the compute resources at
runtime to meet performance guarantees and minimize resource
costs. Existing cloud controllers often resort to scaling strategies
that are codified as a set of adaptation rules. However, for a cloud
provider, applications running on top of the cloud infrastructure
are more or less black boxes, making it difficult at design time to
define optimal or pre-emptive adaptation rules. Thus, the burden
of taking adaptation decisions is often delegated to the cloud
application. Yet, in most cases, application developers in turn
have limited knowledge of the cloud infrastructure.

In this paper, we propose learning adaptation rules during
runtime. To this end, we introduce FQL4KE, a self-learning
fuzzy cloud controller. In particular, FQL4KE learns and modifies
fuzzy rules at runtime. The benefit is that, for designing cloud
controllers, we do not have to rely solely on precise design-time
knowledge, which may be difficult to acquire. FQL4KE empowers
users to specify cloud controllers by simply adjusting weights
representing priorities in system goals instead of specifying
complex adaptation rules. The applicability of FQL4KE has been
experimentally assessed as part of the cloud application framework
ElasticBench. The experimental results indicate that
FQL4KE outperforms our previously developed fuzzy controller
without learning mechanisms and the native Azure auto-scaling.


I. INTRODUCTION 


The economic model of pay-per-use behind cloud computing
allows companies to rent a variety of resources for a
certain period and access them via the Internet [26]. Despite the
advantages, dynamic acquisition and release of resources is a
big challenge for applications due to the uncertainty introduced
by workload, cost and user requirements. In order to address
this challenge, many different approaches [33], [37], referred
to as auto-scaling, have been proposed. The current state
of practice relies on threshold-based rules; thanks to their
simplicity and intuitiveness, such rules are offered by many
commercial cloud providers/platforms such as Amazon EC2
[1], Microsoft Azure [2] and OpenStack [3]. The typical practice
is to define a manageable, usually small and comprehensible
set of scaling rules, assuming a linear and constant dependency
between resource assignments and performance improvements.
In Internet-scale applications, however, the complexity of the
application architecture, the interferences among components and
the frequency with which hardware and software failures arise
typically invalidate these assumptions [38], [17].

The research community has investigated many alternative
approaches. There have been solutions based on classical control
theory and on knowledge-based controllers, which thus
suffer from similar limitations [40]. Traditional capacity
planning approaches based on queuing theory or similar
model-based approaches [6] do not fully address the dynamics
of cloud applications, due to mathematical simplifications
and/or their static nature: since the models are complex to
evolve at runtime, these approaches often resort to parameter
tuning [28]. Recent approaches based on self-organizing controllers
have been shown to be a better fit for the complexity of cloud
controllers [17], [9]. However, a practical challenge remains
unanswered, namely the reliance on users for defining cloud
controllers. This challenge stems from the following fact: from
the cloud provider's perspective, the details of the applications
are simply not visible, making it difficult to accurately devise an
optimal set of scaling rules. Thus, the burden of determining such
rules falls on the cloud users, who do not have enough knowledge
about the workloads, infrastructure or performance modeling.


A. Research Challenges 


In our previous work [27], we exploited fuzzy logic to
facilitate intuitive knowledge elicitation by users. The key strength
of fuzzy logic is the ability to translate human knowledge
into a set of intuitive rules. During the design process of a
fuzzy controller, a set of IF-THEN rules must be defined.
Although we showed that users are more comfortable with
defining auto-scaling rules using fuzzy linguistic variables
[27], the rules have to be defined at design-time, leading to the
following issues: (i) knowledge may not be available (the user
cannot prescribe any rule); (ii) knowledge may be available
but only in part (the user can only specify partial rules for some
situations); (iii) knowledge is not always optimal (the user can
specify the rules but they are not effective, e.g., redundant
rules); (iv) knowledge may be precise for some rules but
less precise (i.e., contain uncertainty) for other rules,
depending on the degree of a priori knowledge;
(v) knowledge may need to change at runtime (rules may be
precise at design-time but may drift at runtime). As a result,
user-defined rules may lead to sub-optimal scaling decisions
and loss of money for cloud application providers.





B. Research Contributions 


In order to address the above challenges, we develop an
online learning mechanism, FQL4KE, to adjust and improve
auto-scaling policies at runtime. More specifically, we combine
fuzzy control and Fuzzy Q-Learning (FQL) in
order to connect human expertise to continuous evolution
machinery. Q-learning is a method that allows the system
to learn from interaction with the environment, where the
learning is performed via a reward mechanism [47]. The
combination of fuzzy control and Fuzzy Q-Learning yields
a powerful self-adaptive mechanism, where the fuzzy control
facilitates reasoning at a higher level of abstraction (i.e.,
human reasoning) and Q-learning allows the controller to be
adapted and adjusted at runtime.
The main contributions of this work are as follows: 





(i) a self-learning fuzzy controller, FQL4KE, for dynamic
resource allocation;

(ii) a tool, ElasticBench, as a realization and a means
for experimental evaluation of the entire approach.





The main implication of this contribution is that we no longer
need to rely on the knowledge provided by users: FQL4KE
can start adjusting application resources with no a priori
knowledge. This means the auto-scaling controller can start
working with an empty knowledge base and obtain knowledge
at runtime through the knowledge evolution mechanism.
The rest of the paper is organized as follows. Section II
gives the underlying concepts and motivates the work. Section
III describes the mechanisms that constitute our solution,
followed by a realization in Section IV. Section V discusses the
experimental results, followed by implications and limitations
of this work in Section VI. Finally, Section VII discusses the
related work and Section VIII concludes the paper.


II. MOTIVATION AND JUSTIFICATION 
A. Motivation 


Dynamic resource provisioning, also called auto-scaling, is
a decision-making problem. Cloud controllers that realize
auto-scaling observe the resource consumption of the application
and manipulate the provisioning plans. In more formal terms,
computing nodes are allocated to cloud-based applications by
regularly observing the workload, w, in user requests per time unit,
and the current performance, rt, as the average response time of
the application. The cloud controller decides to allocate more
nodes or release some existing ones in order to keep the
performance rt below a desired performance $rt_{desired}$ declared
by the SLA of the application, while minimizing costs.

There are some common and noticeable characteristics that
often challenge the existing auto-scaling techniques and tools:
(i) the environment is non-episodic, i.e., the current choice will
affect future actions; (ii) cloud infrastructures are complex
and difficult to model; (iii) workloads are irregular and
dynamic. These characteristics of the environment in which the
cloud controller operates as an agent require solving sequential
decision problems, where previous actions in specific states
affect future ones. The common solution for this kind of
problem is to elaborate a plan, policy or strategy to act upon
the specific situation at hand. In this paper, we use the term
policy to refer to the knowledge inside cloud controllers that we
aim to learn at runtime. As a result, policies determine the
decisions that controllers produce in different situations (i.e.,
the state the cloud application is in).

Fig. 1: RL agent interaction with environment.

In this setting, the choice of assigning the responsibility of
allocating the required resources to the application provider
depends on the nature of the applications, which typically
include several components implementing the application logic
through complex interactions. Hence, the platform provider's
native auto-scaling, like approaches monitoring only system metrics,
is sub-optimal with respect to application-specific solutions
such as autonomic controllers. The key reason is that the
native auto-scaling has only limited knowledge, as the
application architecture is not fully visible to providers.

Although a variety of techniques have been proposed,
developing efficient and effective auto-scaling approaches is
a challenging activity [33]. While reactive techniques allocate
resources according to the latest system demands [36], [11], [34],
proactive techniques forecast the future needs upon which
they adjust the resources [19], [32], [42]. The majority of the
approaches apply threshold-based rules to trigger adaptation
actions, which are highly sensitive to the selected thresholds
[35]. The proactive approaches are also prone to severe prediction
errors, as they mostly use linear models to predict future situations
[27], [33]. In this work, we investigate the application of fuzzy
controllers equipped with machine learning, here Q-Learning,
to address the aforementioned challenges.


B. Reinforcement Learning for Elasticity Decision Making 


As depicted in Figure 1(a), an agent takes action $a_t$ when
the system is in state $s_t$, leaves the system to evolve to the
next state $s_{t+1}$, and observes the reinforcement signal $r_{t+1}$.
The process of decision making in elastic systems can be
represented as an interaction between cloud controllers and the
environment. The cloud controller monitors the current state
of the system through its sensors. Based on some knowledge,
it chooses an action and evaluates feedback reward in the form
of utility functions [46]. Situation allows the system to know
when it must monitor the state and reward, and also when it
must take the action corresponding to the state (i.e., trigger
the scaling action). An elastic system may stay in the same
state but should take different actions in different situations,
e.g., under different workload intensities.

To derive an action, the agent uses a policy that aims to
increase the future rewards in the long run. A model of the
environment can help the agent's decision making (cf. Figure
2(a)); however, in many application areas it is not feasible to
have such a model available. Model-free reinforcement
learning (hereafter RL) techniques have been developed to
address this issue, and they are attractive for cloud computing
problems because environment models are lacking. In this paper,
we use a model-free approach. More specifically, we use the
Q-Learning algorithm, which computes the optimal policy with
regard to both immediate and delayed rewards. In this case, a
cloud controller learns a value function (cf. Figure 2(b)) that
gives the consequences of applying different policies.

Fig. 2: Model-based vs. model-free RL.


C. Justification of RL for Dynamic Resource Provisioning 


Several techniques might be considered as suitable candidates
for the problem of dynamic resource allocation: Neural
Networks (NNs), Evolutionary Algorithms, Swarm Intelligence
and Reinforcement Learning. For the research we report in this
paper, we had specific reasons for choosing RL amongst the
alternatives. NNs are appropriate for function approximation,
classification and pattern recognition. A drawback of NNs is that
they require extensive and diverse training before real operation,
which can be a severe constraint, as training trials in the context
of cloud auto-scaling are costly and time consuming [17].

Evolutionary computation, such as genetic algorithm and 
particle swarm optimization, is based on randomly producing 
and comparing the evolution of many genes and particles, 
each of which represents a different configuration of a cloud 
controller. As a result, to evaluate the optimality of each 
particle, the evaluation of the corresponding controller should 
be carried out in many trials. The lack of generality in the 
definition of cloud controllers is also a constraint. Therefore, 
the optimization phase must be repeated. 

Finally, for the following reasons, RL is an appropriate fit:

• Workloads for cloud-based applications are unpredictable,
and obtaining an actual training data set that is representative
of all runtime situations is practically impossible. Unlike
supervised learning approaches (e.g., NNs), RL does not
require a training data set.

• Due to the unpredictability of workload and the complexity
of cloud-based applications, providers do not have
complete knowledge to take proper scaling actions.


III. FUZZY Q-LEARNING FOR KNOWLEDGE EVOLUTION





This section presents our solution, FQL4KE. By combining
fuzzy logic and Q-Learning, FQL4KE deals with the uncertainty
caused by incomplete knowledge. Expert knowledge, if
available, is encoded in terms of rules. The fuzzy rules are
continually tuned through learning from the data collected at
runtime. In case there is no (or limited) knowledge available
at design-time, FQL4KE is still able to operate.

Fig. 3: FQL4KE architecture.


A. FQL4KE Building Blocks


Figure 3 illustrates the main building blocks of FQL4KE.
While the application runs on a cloud platform that provides
the demanded resources, FQL4KE monitors the application
and guides resource provisioning. More precisely, FQL4KE
follows the autonomic MAPE-K loop [29], where different
characteristics of the application (e.g., workload and response
time) are continuously monitored, the satisfaction of system
goals is checked and, accordingly, the resource allocation is
adapted in case of deviation from the goals. The goals (i.e., SLA,
cost, response time) are reflected in the reward function, which
we define in Section III-D.

The monitoring component collects low-level performance
metrics and feeds both the cloud controller and the knowledge
learning component. At each control interval, the actuator
issues the adaptation commands that it receives from the
controller to the underlying cloud platform. Two components,
knowledge learning and the cloud controller, are incorporated
for this purpose. The cloud controller is a fuzzy controller
that takes the observed data and generates scaling actions.
The learning component continuously updates the knowledge
base of the controller by learning appropriate rules. These
two components are described in Sections III-B and III-C,
respectively. Finally, the integration of these two components
is discussed in Section III-D.
B. Fuzzy Logic Controller


Fuzzy inference is the process of mapping a set of control
inputs to a set of control outputs through fuzzy rules. Fuzzy
controllers are mainly applied to problems that cannot be
represented by explicit mathematical models due to the high
non-linearity of the system. Instead, the potential of fuzzy
logic lies in its capability to approximate that non-linearity
by expressing knowledge in a way similar to human
perception and reasoning [27]. The inputs to the controller are
the workload (w) and response time (rt), and the output is the
scaling action (sa) in terms of increment (or decrement) in the
number of virtual machines (VMs).

The design of a fuzzy controller, in general, involves the
following tasks: 1) defining the fuzzy sets and membership
functions of the input signals; 2) defining the rule base, which
determines the behavior of the controller in terms of control
actions using the linguistic variables defined in the previous
task. The very first step in the design process is to partition
the state space of each input variable into various fuzzy sets
through membership functions. Each fuzzy set is associated with
a linguistic term such as “low” or “high”. The membership
function, denoted by $\mu_y(x)$, quantifies the degree of membership
of an input signal x to the fuzzy set y (cf. Figure 4).
In this work, the membership functions, depicted in Figure
4, are both triangular and trapezoidal. As shown, three fuzzy
sets have been defined for each input
(i.e., workload and response time) to achieve a reasonable
granularity in the input space while keeping the number of
states small to reduce the set of rules in the knowledge base.
These fuzzy sets also determine the number of states
in Q-learning, which we describe in the next section.
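The partitioning step can be illustrated with a minimal sketch of triangular and trapezoidal membership functions and three fuzzy sets for one input. The breakpoints, the normalization of the input to [0, 1], and the set names `low`/`medium`/`high` are hypothetical choices made for illustration; the shapes actually used by FQL4KE are those in Figure 4.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d].

    Setting a == b (or c == d) yields a left (or right) shoulder.
    """
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical partition of an input normalized to [0, 1] into three fuzzy sets.
# Neighbouring sets overlap so that the memberships always sum to one.
THREE_SETS = {
    "low": lambda x: trapezoidal(x, 0.0, 0.0, 0.2, 0.5),
    "medium": lambda x: triangular(x, 0.2, 0.5, 0.8),
    "high": lambda x: trapezoidal(x, 0.5, 0.8, 1.0, 1.0),
}


def fuzzify(x: float) -> dict:
    """Degree of membership of the crisp input x in each fuzzy set."""
    return {name: mu(x) for name, mu in THREE_SETS.items()}
```

With this partition, any crisp input activates at most two adjacent sets; for instance, an input of 0.35 belongs half to `low` and half to `medium`.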

The next step consists of defining the inference machinery
for the controller by expressing the elasticity policies in
linguistic terms as a set of rules. An example of such an elasticity
policy is: “IF (w is high) AND (rt is bad) THEN (sa = +2)”,
where the output function is a constant value, which can be
an integer in {−2, −1, 0, +1, +2} associated with the
change in the number of deployed nodes. For simplicity, we
constrain this set to 5 possible actions; depending on the problem
at hand, it can be any finite discrete set of actions. For the definition
of the functions in the rule consequents, the knowledge and
experience of a human expert is generally used. In this work,
no a priori knowledge for defining such rules is assumed. In
particular, FQL4KE attempts to find the consequent Y for the
rules (see Section III-C).
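A minimal sketch of the resulting rule base follows: one rule per combination of workload and response-time terms (3 × 3 = 9 rules), each with a consequent drawn from {−2, −1, 0, +1, +2}, and only the example rule above filled in. The term names and the use of the product as the AND operator when computing firing degrees are assumptions for illustration; the text does not specify which t-norm is used.

```python
from itertools import product

ACTIONS = (-2, -1, 0, +1, +2)        # admissible changes in the number of VMs
W_TERMS = ("low", "medium", "high")  # hypothetical linguistic terms for workload w
RT_TERMS = ("good", "ok", "bad")     # hypothetical linguistic terms for response time rt

# One rule per antecedent combination; consequents stay unknown (None)
# until an expert supplies them or they are learned at runtime.
rules = {antecedent: None for antecedent in product(W_TERMS, RT_TERMS)}
rules[("high", "bad")] = +2          # IF (w is high) AND (rt is bad) THEN (sa = +2)


def firing_degrees(mu_w: dict, mu_rt: dict) -> dict:
    """Firing degree of every rule, using the product as the AND operator (assumed)."""
    return {(tw, trt): mu_w[tw] * mu_rt[trt] for tw, trt in rules}
```

Rules whose antecedents do not match the current input receive a firing degree of zero and therefore do not influence the output.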

Once the fuzzy controller is designed, the execution of the
controller comprises three steps (cf. middle part of Figure
3): (i) fuzzification of the inputs, (ii) fuzzy reasoning, and (iii)
defuzzification of the output. The fuzzifier projects the crisp data
onto fuzzy information using membership functions. The fuzzy
engine reasons on this information based on a set of fuzzy rules
and derives fuzzy actions. The defuzzifier reverts the results back
to crisp mode and activates an adaptation action. For the sake
of simplicity, we calculate the output as a weighted average:








$$y(x) = \sum_{i=1}^{N} \mu_i(x) \times a_i, \tag{1}$$

where N is the number of rules, $\mu_i(x)$ is the firing degree
of rule i for the input signal x, and $a_i$ is the consequent
function for the same rule. The output is then rounded to the
nearest integer, due to the discrete nature of scaling actions.
Finally, this value, if endorsed by the policy enforcer module
(see Section IV), is enacted by issuing appropriate commands
to the underlying cloud platform fabric.

Fig. 4: Fuzzy metrics for auto-scaling.
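The defuzzification step of Eq. (1) and the subsequent rounding can be sketched as below. The sketch divides by the sum of the firing degrees so that the result is a weighted average even if the degrees do not already sum to one; that normalization and the "no change" default when no rule fires are assumptions, not details given in the text.

```python
def defuzzify(firing, consequents):
    """Weighted average of rule consequents (Eq. 1), rounded to whole VMs.

    firing[i] is the firing degree of rule i and consequents[i] its action a_i.
    """
    total = sum(firing)
    if total == 0:
        return 0  # hypothetical default: no rule fired, keep the current VMs
    y = sum(mu * a for mu, a in zip(firing, consequents)) / total
    # Python's round() sends exact .5 ties to the nearest even integer.
    return round(y)
```

For example, two rules firing with degrees 0.2 and 0.8 and consequents −1 and +2 give y = 1.4, which is rounded to a scale-out of one VM.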

Fuzzy controllers, such as RobusT2Scale, which we described
in this section, have some limitations. The knowledge
base defined to adjust resources to keep the performance of a
cloud application at a desired level should be applicable
to any potential scenario (e.g., different load levels). However,
the performance of the controller is not always consistent
with the intentions of its designer. The culprit here is the
fixed set of fuzzy rules. In the following, we describe a
mechanism to overcome this limitation.


C. Fuzzy Q-Learning 


Until this stage, we have shown how to design a fuzzy
controller for auto-scaling a cloud-based application where the
elasticity policies are provided by users at design-time. In this
section, we introduce a mechanism to learn, adjust and adapt the
policies at runtime, enabling knowledge evolution (i.e., the KE
part in FQL4KE) that we promised earlier. As the controller
has to take an action in each control loop, it should try to select
those actions taken in the past which produced good rewards.
Here, by reward we mean the long-term cumulative reward:

$$R_t = r_{t+1} + \gamma r_{t+2} + \dots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \tag{2}$$

where γ is the discount rate determining the relative importance
of future rewards, in the same way that money promised for
some time in the future is worth less than the same amount
of money now. If the agent only takes actions that have
already been tried, it will stick to suboptimal knowledge.
Therefore, there exists a trade-off (cf. step 2 in Algorithm 1)
between the actions that have already been tried (known as
exploitation) and new actions that may lead to better rewards
in the future (known as exploration).
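Eq. (2) can be made concrete with a small sketch that evaluates the discounted return over a finite window of observed rewards; truncating the infinite sum to the rewards available is an assumption made here for illustration.

```python
def discounted_return(rewards, gamma):
    """Eq. (2) truncated to a finite horizon: sum over k of gamma**k * r_{t+k+1}.

    rewards[k] holds r_{t+k+1}; with 0 <= gamma < 1, later rewards weigh less.
    """
    return sum(gamma ** k * r for k, r in enumerate(rewards))
```

With γ = 0.5, three consecutive rewards of 1 give 1 + 0.5 + 0.25 = 1.75, while γ = 0 keeps only the immediate reward.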

In each control loop, the controller needs to take an action
based on a function of the state in which it is located and the
action that it selects. Q(s, a) denotes this Q function, based on
which the controller takes actions; it determines the expected
cumulative reward that can be received by taking action a in
state s. This value directly depends on the policy followed by
the controller, thus determining the behavior of the controller.
This policy π(s, a) is the probability of taking action a from
state s. As a result, the value of taking action a in state s
following the policy π is formally defined as:

$$Q^{\pi}(s,a) = E_{\pi}\left\{ \sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\middle|\, s_t = s, a_t = a \right\}, \tag{3}$$

where $E_{\pi}\{\cdot\}$ is the expectation function under policy π. When
an appropriate policy is found, the RL problem at hand is
solved. Q-learning is a technique that does not require any
specific policy in order to evaluate Q(s, a):


$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right], \tag{4}$$

where η is the learning rate. Therefore, the optimal Q function
can be approximated without any specific policy to follow. In
this case, the policy adaptation can be achieved by selecting a
random action with probability ε and an action that maximizes
the Q function in the current state with probability 1 − ε; note
that the value of ε is determined by the exploitation/exploration
strategy (cf. Section V-A):

$$a(s) = \arg\max_{k} Q(s, k). \tag{5}$$
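A minimal tabular sketch of Eq. (4) and of ε-greedy selection around Eq. (5) follows. It stores Q as a list of rows indexed by state and action; the function names and the table layout are illustrative choices, not part of FQL4KE.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Random action index with probability epsilon, otherwise argmax_k Q(s, k) (Eq. 5)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, s, a, r, s_next, eta, gamma):
    """One application of Eq. (4) to the table q[state][action]; returns the new value."""
    target = r + gamma * max(q[s_next])
    q[s][a] += eta * (target - q[s][a])
    return q[s][a]
```

Because the update bootstraps from the greedy value of the next state regardless of which action is taken next, it learns the optimal Q function while the controller follows the exploratory ε-greedy policy.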


The fuzzy version of Q-learning optimizes the consequents
of the rules in fuzzy controllers. Fuzzy Q-learning
(FQL) has some critical benefits over its traditional counterpart.
First and most importantly, in some application areas the
number of states, and of potential actions that the agent can
take in those states, is high; hence, the q-values need to be
stored in large look-up tables. As a result, Q-learning becomes
impractical in continuous state spaces [24], such as in our case
in this paper. By employing fuzzy variables, continuous state
spaces can be discretized into states represented by all the
combinations of variables (cf. Figure 4).

The fuzzy Q-learning algorithm that we have implemented
is summarized in Algorithm 1. In the case of our running
example, the state space is finite (i.e., 9 states as the full
combination of 3 × 3 membership functions for the fuzzy variables
w and rt) and RobusT2Scale has to choose a scaling action
among 5 possible actions {−2, −1, 0, +1, +2}. However, the
design methodology that we demonstrated in this section is
general and can be applied to any possible state and action
spaces. Note that convergence is detected when the change
in the consequent functions is negligible in each learning loop.
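The steps of Algorithm 1 can be sketched as a small class: `act` covers steps 2 to 4 (per-rule ε-greedy choice, defuzzified control action, estimate of Q(s(t), a)) and `learn` covers steps 6 to 8 (value of the new state, error signal, q-value update). The class and method names and the default values of η, γ and ε are hypothetical; the firing levels α_i(s) are assumed to come from a fuzzifier that is not shown and to sum to one.

```python
import random

ACTIONS = [-2, -1, 0, 1, 2]  # candidate consequents: change in the number of VMs


class FuzzyQLearner:
    """Sketch of Algorithm 1 for N rules and J = len(ACTIONS) actions.

    Every rule gets an action, but rules with firing level 0 have no effect
    on the control action or on the update.
    """

    def __init__(self, n_rules, eta=0.1, gamma=0.8, epsilon=0.1, seed=None):
        self.q = [[0.0] * len(ACTIONS) for _ in range(n_rules)]  # step 1
        self.eta, self.gamma, self.epsilon = eta, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, alpha):
        """Steps 2-4 for firing levels alpha[i] of the current state s(t)."""
        self.alpha = alpha
        self.chosen = []
        for row in self.q:
            if self.rng.random() < self.epsilon:
                self.chosen.append(self.rng.randrange(len(ACTIONS)))  # explore
            else:
                self.chosen.append(max(range(len(ACTIONS)), key=row.__getitem__))  # exploit
        # Step 4: Q(s(t), a) = sum_i alpha_i(s) * q[i, a_i]
        self.q_sa = sum(f * self.q[i][j] for i, (f, j) in enumerate(zip(alpha, self.chosen)))
        # Step 3: a = sum_i alpha_i(s) * a_i, rounded to a whole number of VMs
        return round(sum(f * ACTIONS[j] for f, j in zip(alpha, self.chosen)))

    def learn(self, reward, alpha_next):
        """Steps 6-8 for reward r(t+1) and firing levels of the new state s(t+1)."""
        v_next = sum(f * max(row) for f, row in zip(alpha_next, self.q))
        delta_q = reward + self.gamma * v_next - self.q_sa
        for i, (f, j) in enumerate(zip(self.alpha, self.chosen)):
            self.q[i][j] += self.eta * delta_q * f
        return delta_q
```

In a control loop, `act` would be called with the firing levels of the monitored state, the returned action enacted, and `learn` called once the reward and the next state are observed (step 5 and step 9 of Algorithm 1).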


D. FQL4KE for Dynamic Resource Allocation


The combination of the fuzzy controller and the fuzzy
Q-learning algorithm is illustrated in Figure 3.

Reward function. As illustrated in Figure 3, the controller
receives the current values of w and rt that correspond to
the state of the system, s(t) (cf. Step 4 in Algorithm 1). The
control signal sa represents the action a that the controller
needs to take at each state. We define the reward signal r(t)
based on three criteria: (i) SLO violations, (ii) the amount of
resources acquired, which directly determines the cost, and (iii)
throughput, as follows:


$$r(t) = U(t) - U(t-1), \tag{6}$$


Algorithm 1: Fuzzy Q-Learning

Require: γ, η
1: Initialize q-values:
   $q[i, j] = 0, \; 1 \le i \le N, \; 1 \le j \le J$
2: Select an action for each fired rule:
   $a_i = \arg\max_k q[i, k]$ with probability 1 − ε   ▷ Eq. (5)
   $a_i = \mathrm{random}\{a_k, k = 1, 2, \dots, J\}$ with probability ε
3: Calculate the control action by the fuzzy controller:
   $a = \sum_{i=1}^{N} \alpha_i(s) \times a_i$,   ▷ Eq. (1)
   where $\alpha_i(s)$ is the firing level of the rule i
4: Approximate the Q function from the current q-values and the firing level of the rules:
   $Q(s(t), a) = \sum_{i=1}^{N} \alpha_i(s) \times q[i, a_i]$,
   where Q(s(t), a) is the value of the Q function for the current state s(t) in iteration t and the action a
5: Take action a and let the system go to the next state s(t + 1).
6: Observe the reinforcement signal r(t + 1) and compute the value for the new state:
   $V(s(t+1)) = \sum_{i=1}^{N} \alpha_i(s(t+1)) \cdot \max_k q[i, a_k]$
7: Calculate the error signal:
   $\Delta Q = r(t+1) + \gamma \times V(s(t+1)) - Q(s(t), a)$,   ▷ Eq. (4)
   where γ is a discount factor
8: Update q-values:
   $q[i, a_i] = q[i, a_i] + \eta \cdot \Delta Q \cdot \alpha_i(s(t))$,   ▷ Eq. (4)
   where η is a learning rate
9: Repeat the process for the new state until it converges





where U(t) is the utility value of the system at time t. Hence,
if a controlling action leads to an increased utility, the action
is appropriate. Otherwise, if the reward is close to zero, the
action is not effective. A negative reward (punishment) warns
that the situation is worse after taking the action. We define
the utility function as follows:








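$$U(t) = w_1 \cdot \frac{th(t)}{th_{max}} + w_2 \cdot \left(1 - \frac{vm(t)}{vm_{max}}\right) + w_3 \cdot \left(1 - H(t)\right),$$

$$H(t) = \begin{cases} \dfrac{rt(t) - rt_{des}}{rt_{des}}, & rt_{des} \le rt(t) \le 2 \cdot rt_{des} \\ 1, & rt(t) \ge 2 \cdot rt_{des} \\ 0, & rt(t) \le rt_{des} \end{cases}$$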


where th(t), vm(t) and rt(t) are the throughput, number of
worker roles (VMs) and response time of the system,
respectively. w1, w2 and w3 are their corresponding weights,
determining their relative importance in the utility function.
Note that, in order to aggregate the individual criteria,
we normalized them depending on whether they should be
maximized or minimized. Importantly, the only input required
from users to determine the system goals is the values of these
three weights, i.e., w1, w2 and w3 (cf. Figure 3).
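The reward and utility defined above can be sketched as plain functions. The normalization bounds `th_max` and `vm_max` are assumed parameters, and the piecewise penalty follows one reading of H(t): zero when the response time meets the target, growing linearly, and saturating at one from twice the target.

```python
def response_time_penalty(rt, rt_des):
    """H(t): 0 if rt <= rt_des, linear in between, 1 if rt >= 2 * rt_des."""
    if rt <= rt_des:
        return 0.0
    if rt >= 2 * rt_des:
        return 1.0
    return (rt - rt_des) / rt_des


def utility(th, vm, rt, weights, th_max, vm_max, rt_des):
    """U(t) as a weighted sum of normalized throughput, VM usage and response time.

    th_max and vm_max are assumed normalization bounds (hypothetical parameters).
    """
    w1, w2, w3 = weights
    return (w1 * th / th_max
            + w2 * (1 - vm / vm_max)
            + w3 * (1 - response_time_penalty(rt, rt_des)))


def reward(u_now, u_prev):
    """Eq. (6): r(t) = U(t) - U(t-1)."""
    return u_now - u_prev
```

Changing only the weights tuple shifts the controller's priorities, e.g., a larger w2 favours fewer VMs (lower cost) over throughput.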

Knowledge base update. FQL4KE starts controlling
the allocation of resources with no a priori knowledge. After
enough exploration, the consequents of the rules can
be determined by selecting those actions that correspond to
the highest q-value in each row of the Q table. Although
FQL4KE does not rely on design-time knowledge, if even
partial knowledge is available (i.e., the operator of the system
is confident in providing some of the elasticity policies)
or there exists data regarding the performance of the application,
FQL4KE can exploit such knowledge by initializing the q-values
(cf. step 1 in Algorithm 1) with more meaningful data instead
of initializing them with zero. This implies quicker learning
convergence.

Fig. 5: FQL4KE realization.
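Both operations of the knowledge base update can be sketched briefly: reading the consequent of each rule as the action with the highest q-value in its row, and seeding the Q table from partial expert knowledge. The seeding scheme below (a fixed `bonus` on the expert's action, zero elsewhere) is a hypothetical example of "more meaningful data", not the scheme used in FQL4KE.

```python
ACTIONS = [-2, -1, 0, 1, 2]


def extract_consequents(q):
    """Rule consequents: the action with the highest q-value in each row of the Q table."""
    return [ACTIONS[max(range(len(row)), key=row.__getitem__)] for row in q]


def init_q_from_rules(n_rules, known_rules, bonus=1.0):
    """Zero-initialized Q table, biased towards expert-provided consequents (hypothetical).

    known_rules maps a rule index to its consequent action.
    """
    q = [[0.0] * len(ACTIONS) for _ in range(n_rules)]
    for i, action in known_rules.items():
        q[i][ACTIONS.index(action)] = bonus
    return q
```

Note that rows still at zero tie on every action, so `extract_consequents` falls back to the first action; in practice such rules are only meaningful after enough exploration.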








IV. REALIZATION 


To demonstrate the applicability of our approach, we realized
a prototype of FQL4KE as a generic, configurable and
customizable platform on Microsoft Azure. As illustrated in
Figure 5, this prototype comprises three integrated components:





(i) A learning component, FQL, implemented in Matlab

(ii) A cloud controller reasoning engine (RobusT2Scale)
implemented in Matlab

(iii) A cloud-based application framework (ElasticBench)
implemented with Microsoft .NET technologies (.NET
framework 4 and Azure SDK 2.5)

(iv) The integration between these three components by
software connectors developed with .NET technologies.


The rule base of FQL4KE is continuously updated by the FQL
component, and new or updated rules are fed into the cloud
controller RobusT2Scale as an intermediate text-based *.fis
file. ElasticBench is a new component we developed with
Microsoft development technologies that sits on top of the
cloud platform. ElasticBench is in charge of monitoring
the application as well as deploying virtual machines.














A. ElasticBench 


ElasticBench includes a workload generator in order to
simulate different patterns. This allows us to test and train the
controller before actual execution. It also provides all the
required functionalities to perform a variety of auto-scaling
experiments. In order to build a generic workload generator,
we developed a service to generate Fibonacci numbers. A
delay is embedded in the process of calculating Fibonacci
numbers to simulate a process that takes a reasonably long
period. Note that calculating Fibonacci numbers is an O(N)
task, which makes it a good candidate for demonstrating
different application types by embedding different delays,
since our platform can generate requests with varying patterns.
As depicted in Figure 6, a positive integer (n ∈ ℕ) is written
to the queue and, on the other end of the queue, a Fibonacci
series is calculated; this takes different times depending
on the number of processors at the other end of the queue.
This enables us to test our solution with a generic platform
that has several functionalities, each of which takes different
times depending on the available resources providing different
levels of processing capability. This resembles a real-world
software system that exposes different functionalities with
different response times depending on resource contention.
For instance, some functionalities are CPU intensive and,
depending on the number of available CPUs, reply with a
different response time, while others may be memory intensive
and, depending on the available memory, take different times
to react.

¹ Code is available at https://github.com/pooyanjamshidi/Fuzzy-Q-Learning
² Code is available at https://github.com/pooyanjamshidi/RobusT2Scale
³ Code is available at https://github.com/pooyanjamshidi/ElasticBench

Two types of Azure services are used to implement
ElasticBench: web role and worker role. Note that web and worker
roles correspond to VMs at the infrastructure level. The requests
issued from the load generator are received by the web role,
which puts a message on a task assignment queue, as shown
in Figure 6. The worker role instances continuously check
this queue and, after locating a message, immediately start a
background process (to calculate the Fibonacci number) based
on the content of the message. The worker roles
communicate with the storage to acquire the data required for
processing (e.g., previously calculated Fibonacci numbers).
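The queue-based interplay between the web role and the processing worker roles can be sketched in-process as follows. Here Python's `queue.Queue` stands in for the Azure task assignment queue and a dictionary stands in for the storage of previously computed results; the function names and the `delay` parameter are hypothetical and only mimic the embedded processing delay described above.

```python
import queue
import time

task_queue = queue.Queue()  # stands in for the Azure task assignment queue
cache = {}                  # stands in for storage of previously computed numbers


def web_role(n):
    """Receive a request and enqueue it as a message (the positive integer n)."""
    task_queue.put(n)


def fibonacci(n):
    """Iterative O(n) computation of the n-th Fibonacci number."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def worker_role(delay=0.0):
    """Take one message; serve it from the cache if possible, else compute and store it."""
    n = task_queue.get()
    if n not in cache:
        time.sleep(delay)  # embedded delay simulating a long-running task
        cache[n] = fibonacci(n)
    task_queue.task_done()
    return cache[n]
```

Varying `delay` per request type is one way to emulate functionalities with different processing costs on the same platform.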

We implemented two types of worker role: some worker roles (P) process the messages (i.e., calculate the Fibonacci numbers), whereas the other type of worker role (M) implements the MAPE-K feedback control loop. The main functionalities of the M worker role are as follows: (1) it reads performance metrics from the blackboard storage; (2) it calculates metrics for feeding the fuzzy controller; (3) it implements a policy enforcer that checks whether the number of nodes to be enacted is within the predefined range and whether the worker role is in a stable mode; (4) it allows other cloud controllers (i.e., controllers implementing other techniques) to be plugged in with a few lines of code; (5) it implements mechanisms that ensure the resiliency of this worker role.
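Items (3) and (4) can be sketched as follows, assuming a controller interface that maps metrics to a change in the number of P nodes. `CloudController`, `PolicyEnforcer` and `control_step` are hypothetical names; the range of 1 to 7 nodes is taken from the experimental setting in Section V-A, and "stable mode" is modelled, as an assumption, by a boolean flag meaning that no earlier scaling action is still being enacted.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class CloudController(Protocol):
    """Hypothetical plug-in interface: any controller maps metrics to a node delta."""

    def decide(self, metrics: Dict[str, float]) -> int: ...


@dataclass
class PolicyEnforcer:
    """Endorses a decision only if the target size is in range and the roles are stable.

    Defaults mirror the experimental setting (1 to 7 P nodes).
    """
    min_nodes: int = 1
    max_nodes: int = 7

    def endorse(self, current_nodes: int, delta: int, stable: bool) -> bool:
        if not stable:
            return False  # a previous scaling action is still being enacted
        return self.min_nodes <= current_nodes + delta <= self.max_nodes


def control_step(controller: CloudController, enforcer: PolicyEnforcer,
                 metrics: Dict[str, float], current_nodes: int, stable: bool) -> int:
    """Return the number of P nodes after one decision (unchanged if not endorsed)."""
    delta = controller.decide(metrics)
    if delta != 0 and enforcer.endorse(current_nodes, delta, stable):
        return current_nodes + delta  # here an actuator would enact the change
    return current_nodes
```

Rejecting an out-of-range decision, rather than clamping it to the nearest allowed size, is a design choice of this sketch; the text only states that the enforcer checks the range.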

The design decision to implement the MAPE-K functionalities inside a worker role in the cloud was strategic for the experiments that we needed to run. On the one hand, to avoid network latencies in decision enaction, we required an internal and isolated network between the decision maker module (i.e., the M worker role) and the scaling roles (i.e., P). On the other hand, we needed a design close to real-world settings: in commercial solutions in public clouds, the auto-scaling controller sits near the scaling layer rather than being deployed on premises.

Fig. 6: Overview of our experimental setting.

In summary, ElasticBench follows these steps in each experiment: (1) the workload generator LG generates a set of requests according to a predefined pattern. (2) The listener service L receives the requests and pushes them as messages to the task assignment queue. (3, 4, 5) Worker roles P pick the messages from the queue and process them; if the results are already stored in the cache, they simply read them from the cache. (6) Low-level monitoring performance metrics are stored in the blackboard storage. (7) Worker role M retrieves the counters and calculates the metrics. (8) M then feeds them to the cloud controller, and a decision (i.e., a change in the number of P worker roles) is taken. (9) If the decision has been endorsed by the policy enforcer, then (10) it is enacted by the actuator on the underlying platform. Note that the actuator calls the appropriate RESTful operations of the Azure platform in order to change the configuration of the P worker roles and to enact the changes accordingly. (11) Worker role M periodically writes the results to a table storage, (12) which can later be used for further analysis.


B. Online Knowledge Learning 


Online learning is a mechanism for enabling knowledge evolution at runtime [4]. As shown in Figure 7, online knowledge learning operates on top of the autonomic controller. The realization of the learning mechanism is divided into the following phases: (i) monitored operation, (ii) learning, and (iii) normal operation. Each phase corresponds to an execution mode. When the learning process is executed, the system enters the monitored operation mode, in which statistics for analysis are periodically collected. After completion, control is returned to the learning process, which enters the learning mode. In this phase, depending on the collected statistics, it may update the knowledge base. This completes the knowledge learning loop (cf. the loop at the top layer in Figure 7).
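The alternation between the execution modes can be sketched as a simple loop. This is an illustration under assumptions: the callbacks `collect_statistics` and `maybe_update_kb` are hypothetical placeholders, and the normal operation mode is listed but not entered, because the text does not specify when the system switches to it.

```python
from enum import Enum, auto
from typing import Any, Callable, List, Tuple


class Mode(Enum):
    MONITORED_OPERATION = auto()
    LEARNING = auto()
    NORMAL_OPERATION = auto()  # listed for completeness; not entered in this sketch


def knowledge_learning_loop(collect_statistics: Callable[[], Any],
                            maybe_update_kb: Callable[[Any, Any], Any],
                            knowledge: Any, rounds: int) -> Tuple[Any, List[Mode]]:
    """Alternate monitored operation and learning; return final knowledge and a mode trace."""
    trace = []
    for _ in range(rounds):
        trace.append(Mode.MONITORED_OPERATION)
        stats = collect_statistics()  # statistics for analysis are collected
        trace.append(Mode.LEARNING)
        knowledge = maybe_update_kb(knowledge, stats)  # may leave the knowledge base unchanged
    return knowledge, trace
```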


V. EXPERIMENTAL EVALUATION 


We demonstrate the efficiency and effectiveness of FQL4KE via an experimental evaluation. More specifically, the key purpose of the experiments is to answer the following questions:
RQ1. Is FQL4KE able to learn how to efficiently acquire resources for dynamic systems in a cloud environment?
RQ2. Is FQL4KE flexible enough to allow the operator to set different strategies, and how effective is the approach in terms of key elasticity criteria (cf. criteria column in Table I)?

Fig. 7: Augmenting MAPE-K with online learning.





A. Experimental Setting 


Fig. 8: Exploitation/exploration strategies.

The environment (i.e., cloud platform) in this setting is unique in some aspects. The main differentiating aspect is the delay in receiving rewards after each scaling action has been taken. The agent (i.e., cloud controller) deployed in a delayed-feedback environment (i.e., cloud) learns the reward only after a delay, i.e., a non-negative integer number of time steps between the agent taking a scaling action and actually receiving its feedback (the state observation and reward). In each monitoring cycle, which happens every 10 seconds, the controller knows its state, but in order to receive the reinforcement signal it has to wait, for example, 8-9 minutes for "scaling out" actions and 2-3 minutes for "scaling in" actions to be enacted. Such delayed-feedback environments introduce challenges for learning convergence. We tackled this by investigating different learning strategies. As depicted in Figure 8, we considered five different exploitation/exploration strategies (i.e., S1 to S5). For instance, in S1, the learning process starts with a high exploration rate, i.e., ε = 1 (cf. Step 2 in Algorithm 1). We set this in order to explore all possible actions enough times in early cycles. Once the optimal fuzzy rules are learned, the controller with updated elasticity policies replaces the current one. However, at this stage, due to the dynamics of the workload, we cannot set ε = 0, because whenever a change is introduced to the workload, the learning process needs to be performed again. As a result, after the first set of rules has been learned, we set ε = 0.2 in order to maintain a balance between exploration and exploitation. In summary, FQL starts with an exploration phase and, after the first learning convergence has happened, it enters the balanced exploration-exploitation phase. However, in order to compare the performance of FQL4KE under different strategies, we consider other learning strategies as well. For instance, in S2, after initial learning with high exploration, we set ε = 0 in order to fully exploit the learned knowledge.
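The strategies S1 and S2 described above, together with the maximum-exploration strategy S4, can be expressed as schedules for the exploration rate ε combined with an ε-greedy choice per rule. This is a hedged sketch: the dictionary `STRATEGIES` and the function `choose_action` are hypothetical, S3 is omitted because its constant exploration value is not given here, and the actual selection step in Algorithm 1 may differ.

```python
import random
from typing import Callable, Dict, Sequence

# Exploration rate as a function of whether a first convergence has happened.
STRATEGIES: Dict[str, Callable[[bool], float]] = {
    "S1": lambda converged: 0.2 if converged else 1.0,  # explore, then keep a balance
    "S2": lambda converged: 0.0 if converged else 1.0,  # explore, then fully exploit
    "S4": lambda converged: 1.0,                        # maximum exploration throughout
}


def choose_action(q_row: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice of an action index for one rule."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore: uniformly random action
    return max(range(len(q_row)), key=lambda j: q_row[j])  # exploit: highest q-value
```

With ε = 0 (S2 after convergence) the choice is always the highest-valued action, whereas S4 never stops drawing random actions, which is consistent with it never converging.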

Fig. 9: Temporal evolution of control surface.

The learning rate in the experiments is set to a constant value η = 0.1 and the discount factor to γ = 0.8. The minimum and maximum numbers of nodes that are allowed to be enacted are set to 1 and 7, respectively. We set the control interval to 10 s. The worker role on which FQL4KE is deployed is a small VM with 1 core and 1792 MB of memory, while the P worker roles (cf. Figure 6) are extra-small VMs with 1 core and 768 MB of memory. Initially, we set all cells in the Q table to zero, assuming no a priori knowledge. We set the weights in the reward function all equal, i.e., w1 = w2 = w3 = 1 (cf. Eq. 7). The experiment time has been set to 24 hours in order to monitor the performance of the system over enough learning steps (due to the delay in reward observation, each step takes between 2 and 9 minutes).
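A Q table initialized to zero with η = 0.1 and γ = 0.8 can be updated with the textbook fuzzy Q-learning rule: Q(s, a) and V(s') are firing-strength-weighted sums over the rules, and each rule's chosen q-value is moved towards the temporal-difference target in proportion to its firing strength. This is a sketch, not a reproduction of Algorithm 1; the function `fql_update` and the normalization of firing strengths are assumptions. Because of the delayed feedback, such an update would only be applied once the reward for an enacted scaling action has been observed.

```python
import numpy as np

N_RULES, N_ACTIONS = 9, 5   # 9 fuzzy rules (states) x 5 scaling actions
ETA, GAMMA = 0.1, 0.8       # learning rate and discount factor from the setting


def fql_update(q: np.ndarray, firing: np.ndarray, chosen: np.ndarray,
               reward: float, next_firing: np.ndarray,
               eta: float = ETA, gamma: float = GAMMA) -> float:
    """One fuzzy Q-learning step; updates q in place and returns the TD error.

    firing / next_firing: firing strengths of the rules in the current and next
    state; chosen: index of the action selected for each rule.
    """
    rules = np.arange(q.shape[0])
    alpha = firing / firing.sum()              # normalized firing strengths
    alpha_next = next_firing / next_firing.sum()
    q_sa = float(np.dot(alpha, q[rules, chosen]))        # Q(s, a)
    v_next = float(np.dot(alpha_next, q.max(axis=1)))    # V(s')
    td_error = reward + gamma * v_next - q_sa
    q[rules, chosen] += eta * td_error * alpha  # credit each rule by its firing strength
    return td_error
```

Rules that do not fire (α = 0) are left untouched, which matches the observation that a q-value only changes when its rule is activated.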


B. FQL4KE Efficiency (RQ1)


The temporal evolution of the q-values associated with each state-action pair for learning strategy S1 is shown (for a subset of pairs) in Figure 10. Note that a q-value changes only when the corresponding rule is activated, i.e., when the system is in state S(t) and takes a specific action a_j. As the figure shows, some q-values become negative during the exploration phase. This means that these actions are punished and, as a result, are not appropriate to take in the future. The optimal consequent for each rule in the rule base is determined by the highest q-value at the end of the learning phase. For instance, action as is the best consequent for rule number 9 in learning strategy S1.
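Selecting the optimal consequent of each rule amounts to an argmax over each row of the Q table. In this sketch the index-to-action mapping `ACTIONS` (the five scaling actions -2, -1, 0, +1, +2 considered in this work, in that order) and the helper `learned_rule_base` are assumptions; rule number 9 corresponds to row index 8.

```python
import numpy as np

ACTIONS = (-2, -1, 0, +1, +2)  # assumed mapping of action index to change in nodes


def learned_rule_base(q: np.ndarray) -> list:
    """For each rule (row), return the scaling action with the highest q-value."""
    return [ACTIONS[j] for j in np.argmax(q, axis=1)]
```

Note that `np.argmax` breaks ties by taking the first index, so a rule that was never activated (all q-values still zero) would default to the first action; an implementation may prefer to keep a design-time consequent in that case.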

In accordance with the change in the q-values, the control surface of the fuzzy controller also evolves. Figure 9 shows the temporal evolution of the control surface of the fuzzy controller. The initial design-time surface is not shown, as it is a constant plane at zero. The surface evolves until learning has converged. Note that the first surface is the one in the upper left, followed by the upper right and the lower left; the final surface, reached when learning has converged, is located in the lower right corner.
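The notion of a control surface can be made concrete with a simplified type-1 sketch: two normalized inputs, three triangular membership functions each (hence 3 × 3 = 9 rules, matching the 9 states), product firing strengths and weighted-average defuzzification. The membership functions, inputs and function names are assumptions for illustration; the actual RobusT2Scale controller may differ. With all consequents at zero, the surface is the constant plane at zero described above, and substituting learned consequents reshapes it.

```python
import numpy as np

# Hypothetical triangular membership functions (low, medium, high) on [0, 1].
MFS = [(-0.5, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.5)]


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet a, c and peak b."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def control_output(x1: float, x2: float, consequents: np.ndarray) -> float:
    """Weighted average of the 3x3 rule consequents, using product firing strengths."""
    m1 = np.array([tri(x1, *p) for p in MFS])
    m2 = np.array([tri(x2, *p) for p in MFS])
    w = np.outer(m1, m2)  # firing strength of rule (i, j)
    return float((w * consequents).sum() / w.sum())


def control_surface(consequents: np.ndarray, n: int = 21) -> np.ndarray:
    """Evaluate the controller output on an n x n grid of normalized inputs."""
    grid = np.linspace(0.0, 1.0, n)
    return np.array([[control_output(a, b, consequents) for b in grid] for a in grid])
```

The three membership functions sum to one everywhere on [0, 1], so the denominator in `control_output` never vanishes.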

Fig. 10: Temporal evolution of q-values.

The runtime overhead of the feedback control loop activities (cf. Figure 5) is depicted in Figure 11. We collected these data points in each control loop (i.e., more than 8600 data points). As the results show, the learning overhead is on the order of 100 ms, and the monitoring and actuation delays are on the order of 1000 ms. Note that the actuation delay only covers issuing the change command; it does not include the enaction time, which is on the order of several minutes.
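One way to collect per-activity delays such as those in Figure 11 is to time each phase of every control loop with a monotonic clock. The sketch below is an assumption about how such measurements could be gathered, not the instrumentation used in ElasticBench; `measure_loop` and the phase callables are hypothetical, and, as noted above, an actuation phase would only time the issuing of the scaling command, not its enaction.

```python
import statistics
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def measure_loop(phases: List[Tuple[str, Callable[[], object]]],
                 iterations: int) -> Dict[str, float]:
    """Run each named phase once per iteration; return the median delay per phase in ms."""
    samples: Dict[str, List[float]] = defaultdict(list)
    for _ in range(iterations):
        for name, phase in phases:
            start = time.perf_counter()
            phase()
            samples[name].append((time.perf_counter() - start) * 1000.0)
    return {name: statistics.median(values) for name, values in samples.items()}
```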


C. FQL4KE Flexibility and Effectiveness (RQ2)


Let us now investigate how effective the self-learning mechanism in FQL4KE is. More specifically, we want to study how the learning component of FQL4KE improves dynamic resource allocation over static rule-based or native mechanisms. Table I summarizes the criteria that we considered for comparing different auto-scaling strategies with respect to different workload patterns.

Fig. 11: Runtime delay for MAPE loop activities.


Note that strategy S5 corresponds to the fuzzy controller with initial knowledge extracted from users at design time and no learning component, and the last strategy corresponds to Azure native auto-scaling. We synthetically generated six different workload patterns (see Figure 12) in order to provide enough environmental conditions for this comparison. The x axis shows the experiment time and the y axis shows the number (in [0, 100]) for which the Fibonacci series needs to be calculated, representing the workload intensity, analogous to the number of concurrent users of a web-based application. A key parameter in learning-based approaches is the convergence delay to reach the optimal policy. The response time of the system under different workloads is also considered as a comparison criterion, as are the average number of VMs acquired throughout the experiment interval and the number of changes in the underlying resources (i.e., the sum of issued scaling actions). The main findings described in Table I can be summarized as follows:


• Sequentially decreasing the exploration factor (cf. S1) is effective in accelerating learning convergence. Moreover, it is also effective for highly dynamic workloads such as "quickly varying" (cf. Figure 12): because it keeps a minimum of ε = 0.2 once the initial knowledge has been learned, it helps keep the rules updated when new situations arise.

• Initial high exploration (cf. S2) is effective for quick convergence, but in non-predictable workloads such as "quickly varying", the decisions become sub-optimal. This is evident when comparing the average number of VMs and the number of learning iterations until convergence for the "large variations" and "quickly varying" patterns.

• Although high constant exploration (cf. S3) is effective in unpredictable environments (see the response time and compare it with other strategies), it is not optimal in terms of convergence, number of changes, and acquired resources. Note that a higher number of changes in the resources means that, for a considerable period of time, the deployment environment of the application is unstable.

• Maximum exploration rate (cf. S4) is by no means a good learning strategy, as it only produces random actions and never converges to an optimal policy.

• Strategy S5 is the RobusT2Scale cloud controller, representing policy-based adaptation without any policy adaptation process. Comparing response time, number of changes, and average number of resources (for FQL4KE these are relatively lower in almost all aspects and for all patterns), we observe that FQL4KE is effective in terms of learning optimal policies and updating them at runtime.

• Both cloud controllers, with and without a learning mechanism, are more effective than the native reactive auto-scaler of the cloud platform. Note that for the controller without learning, we consider a reasonable set of rules to govern elasticity decision making; with a non-sensible set of rules, the native auto-scaling of Azure would beat RobusT2Scale.

Fig. 12: Synthetic workload patterns.
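The comparison criteria used in Table I can be computed from a per-cycle trace of an experiment. In this sketch we assume that "rt95%" denotes the 95th-percentile response time (computed with the nearest-rank method), that the VM figure is the average number of acquired VMs, and that "node change" is the total number of nodes added or removed; the paper's exact definitions may differ, and `elasticity_criteria` is a hypothetical helper. Convergence is not computed here, as it depends on the learning algorithm.

```python
import math
from typing import Sequence, Tuple


def nearest_rank_percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(values)
    rank = max(math.ceil(p * len(ordered) / 100), 1)
    return ordered[rank - 1]


def elasticity_criteria(response_times_ms: Sequence[float],
                        vm_counts: Sequence[int]) -> Tuple[float, float, int]:
    """Return (95th-percentile response time, average VMs, total node changes)."""
    rt95 = nearest_rank_percentile(response_times_ms, 95)
    avg_vms = sum(vm_counts) / len(vm_counts)
    # assumed definition: total number of nodes added or removed over the run
    node_changes = sum(abs(b - a) for a, b in zip(vm_counts, vm_counts[1:]))
    return rt95, avg_vms, node_changes
```

For instance, response times of 1 to 100 ms and a VM trace of [1, 1, 3, 2, 2] give (95, 1.8, 3).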





VI. DISCUSSION 
A. Computational Complexity and Memory Consumption 


Steps 2 to 8 in Algorithm 1 are the most computationally intensive steps of our approach; based on our experiments, they take on the order of a few minutes for 9 states and 10,000 learning epochs. However, this is not an issue in our setting, because the control loops are long enough: scaling actions themselves take on the order of several minutes, i.e., 8-9 minutes for scaling out an extra small VM on the Azure platform and 2-3 minutes for removing an existing VM.

In addition, the memory consumption of our approach is determined by the dimensions of the lookup table that stores and updates the q-values. In other words, the space complexity of our approach is always O(N × A), where N is the number of states and A is the number of actions. In the setting described in this paper, the table is composed of 9 states × 5 actions = 45 q-values; thus, memory consumption is negligible.
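The arithmetic can be checked directly, assuming the table is stored as a dense NumPy array of 64-bit floats (an implementation choice of this sketch, not necessarily the one used in FQL4KE): 45 q-values occupy 360 bytes.

```python
import numpy as np

N_STATES, N_ACTIONS = 9, 5
# One float64 q-value per (state, action) pair: space grows as O(N x A).
q_table = np.zeros((N_STATES, N_ACTIONS))
print(q_table.size, q_table.nbytes)  # prints: 45 360
```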


B. FQL4KE for Policy-based Adaptations


Although in this paper we showed the applicability of FQL4KE with RobusT2Scale, the approach is general and can be integrated with any knowledge-based controller. By knowledge-based controller, we mean any controller that has an explicit notion of knowledge that can be specified in terms of rules and used for reasoning and producing the control signal. In essence, FQL4KE can be integrated with such controllers to learn rules and populate the knowledge base at runtime. Such policy-based controllers have not only been applied to resource scaling but also, previously, to rule-based adaptation of software architecture at runtime [22], [23].

TABLE I: Comparison of the effectiveness of exploration/exploitation strategies under different workloads.

Strategy  Criteria       Big spike     Dual phase    Large variations
S1        rt95%, VMs     1212ms, 2.2   548ms, 3.6    991ms, 4.3
          node change    390           360           420
          convergence    32            34            40
S2        rt95%, VMs     1298ms, 2.3   609ms, 3.8    1191ms, 4.4
          node change    412           376           429
          convergence    38            36            87
S3        rt95%, VMs     1262ms, 2.4   701ms, 3.8    1203ms, 4.3
          node change    420           387           432
          convergence    30            29            68
S4        rt95%, VMs     1193ms, 3.2   723ms, 4.1    1594ms, 4.8
          node change    487           421           453
          convergence    328           328           328
S5        rt95%, VMs     1339ms, 3.2   729ms, 3.8    1233ms, 5.1
          node change    410           377           420
          convergence    N/A           N/A           N/A
Azure     rt95%, VMs     1409ms, 3.3   712ms, 4.0    1341ms, 5.5
          node change    330           299           367
          convergence    N/A           N/A           N/A

Strategy  Criteria       Quickly varying   Slowly varying   Steep tri phase
S1        rt95%, VMs     1319ms, 4.4       512ms, 3.6       561ms, 3.4
          node change    432               355              375
          convergence    65                24               27
S2        rt95%, VMs     1350ms, 4.8       533ms, 3.6       603ms, 3.4
          node change    486               370              393
          convergence    98                45               28
S3        rt95%, VMs     1287ms, 4.9       507ms, 3.7       569ms, 3.4
          node change    512               372              412
          convergence    86                40               23
S4        rt95%, VMs     2098ms, 5.9       572ms, 5.0       722ms, 4.8
          node change    542               411              444
          convergence    328               328              328
S5        rt95%, VMs     1341ms, 5.3       567ms, 3.7       512ms, 3.9
          node change    479               366              390
          convergence    N/A               N/A              N/A
Azure     rt95%, VMs     1431ms, 5.4       1101ms, 3.7      1412ms, 4.0
          node change    398               287              231
          convergence    N/A               N/A              N/A

C. Limitations of FQL4KE

Despite its benefits, FQL4KE has some limitations. First, the scaling actions produced by FQL4KE during the initial learning epochs at runtime may perform poorly: at early stages, before the learning process has converged, such decisions may cause over-provisioning or under-provisioning. However, other strategies (e.g., temporary over-provisioning) can be adopted in parallel to let the approach learn policies; once it has learned optimal policies, it becomes the sole decision maker for resource allocation. Second, the learning process may be sensitive to the choice of the reinforcement signal (cf. Equation 7). It also depends on the system states having been visited sufficiently often [43].

D. Threats to Validity

There are a number of threats to the validity of the results presented in Section V. First, the results presented in Table I may differ slightly depending on the utility function defined in Eq. 7. We defined a reasonable function to measure the reward, but it could be defined differently, leading to a different effectiveness of the learning strategies. We expect the results to remain consistent with the effectiveness of our solution (cf. Table I) as long as the function is appropriate, i.e., it considers both reward and punishment, possibly with metrics other than the ones we used, rather than only one aspect.

Another threat to the validity of the results is the application framework that we built for our experiments, i.e., ElasticBench. Although we embed different characteristics of a cloud-based application by using Fibonacci-based calculations and cloud-based technologies such as caching, the results presented in Table I may differ slightly for another type of application. However, since we can simulate different functionalities with this application framework, we expect the results on a different application to be consistent with the ones presented in this paper. This requires further investigation with real-world software applications.

Although the approach does not impose any constraints on the possible number of scaling actions, for simplicity we only consider five possible scaling actions (i.e., -2, -1, 0, +1, +2) for describing the approach and evaluations. This limited set of actions has some implications for the performance (cf. Section V-B) and the effectiveness of learning (cf. Section V-C).

Finally, the limited number of workload patterns (six patterns are used in this work for evaluation, cf. Figure 12) is another threat to validity. This set of patterns, which is also used in other research [19], is not comprehensive, but it provides reasonably sufficient environmental conditions for evaluation.


VII. RELATED WORK 


In autonomic computing [29], exploiting policy-based adaptation techniques to build self-adaptive software has been attractive. In the following, instead of reviewing auto-scaling approaches, which have been comprehensively reviewed from different aspects in [33], [11], [16], [18], [6], we only consider work related to fundamental attributes of autonomic computing (i.e., self-adaptiveness, self-learning, self-optimizing). We then single out and categorize the selected works whose focus is on policy-based adaptation, whether related to software adaptation or to dynamic resource allocation.

Policy-based adaptation. In the self-adaptive software literature, policy-based adaptation has gained momentum due to its efficiency and flexibility for adaptation planning [25]. A policy, in general, is a mapping from a situation or condition to an appropriate action, strategy, or adaptation. A policy-based approach can potentially decouple the adaptation logic from how to react when necessary. Rainbow exploits architecture-based adaptation, in which the system chooses a new architectural reconfiguration at runtime based on rules defined at design time. In a similar line, Sykes et al. propose an online planning approach for architecture-based self-managed systems. Their work describes a plan as a set of condition-action rules, generated by observing a change in the operational environment or a system failure. Georgas and Taylor present an architecture-centric knowledge-based approach in which adaptation policies are specified as reaction rules. Not all policy-based approaches exploit if-then rules; other representations of policy have also been utilized. For instance, model-based approaches in terms of variability models have been adopted in [13]. While policy-based approaches have been shown to be useful in some settings (e.g., enforcing certain characteristics in the system), they cannot deal with unseen situations or uncertainties. The system hence produces suboptimal decisions, as there is no automatic mechanism to react when exceptions occur, and human intervention is usually needed [25]. To address these issues, online policy evolution has been introduced [30], [4]. The solution proposed in this paper, FQL4KE, is in the same line of research, but applies fuzzy Q-learning, for the first time, to the problem of dynamic resource allocation.

Dynamic Adaptation Planning. Qian et al. exploit case-based reasoning to improve the effectiveness of adaptation planning by learning from past experiences. Goal models are used to represent the system requirements of self-adaptive systems, while adaptation is essentially a search for a match between new situations and the closest prior cases of adaptation. In [8], dynamic decision networks are proposed to deal with uncertainty in the decision making of self-adaptive systems. The initial models are provided by experts; however, the models are updated at runtime as more evidence is observed through monitoring. Esfahani et al. discuss the application of black-box learning models to understand the impact of different features in a self-adaptive system. Given a system goal, a function is learned to formulate the impact of different features, and the features are accordingly enabled or disabled to adapt to changes and achieve the goals. Amoui et al. present an approach based on reinforcement learning to select adaptation actions at runtime. Through an adaptive web-based case study, they show that the approach provides results similar to those of a voting-based approach that uses expert knowledge. Kim et al. discuss the application of Q-learning to plan architecture-based adaptations; a similar policy-based architecture adaptation is also proposed in [22]. These approaches are applied in the robotics domain. Similarly to these works, FQL4KE is proposed to address the issue of decision making in autonomic systems; however, it particularly focuses on resource allocation in cloud-based applications.

Dynamic Resource Allocation. Xu et al. [48] present an approach to learning appropriate auto-configuration of virtualized resources. It uses multiple agents, each of which applies reinforcement learning to optimize the auto-configuration of its dedicated environment. Barrett et al. investigate the impact of the varying performance of cloud resources on application performance. They show that a resource allocation approach that considers this aspect achieves benefits in terms of performance and cost. To reduce the learning time, a parallelized reinforcement learning algorithm is proposed, in which multiple agents deal with the same tasks to speed up the exploration of the state space. The reward values are calculated by combining the accumulated experience of the different agents. In a similar approach, appropriate initialization of the q-values is proposed to accelerate learning convergence. Tesauro et al. demonstrate how to combine the strengths of RL (model-free) and queuing models (model-based) in a hybrid approach, in which the RL component needs to be trained at design time while at runtime a queuing-model policy controls the system. In [40], a multi-layer approach is presented to handle multi-objective requirements such as performance and power in dynamic resource allocation. The lower layer focuses on each objective and exploits a fuzzy controller proposed earlier in [49]. The higher layer maintains a trade-off by coordinating the controllers. Lama et al. integrate neural networks (NNs) with fuzzy logic to build adaptive controllers for autonomic server provisioning. Similar to our approach, the NNs define a set of fuzzy rules, and the self-adaptive controller adapts the structure of the NN at runtime, thereby automatically updating the rules. Unlike the above approaches, FQL4KE offers seamless knowledge evolution through fuzzy control and RL, lifting the burden that was previously on the shoulders of users.





VIII. CONCLUSION AND FUTURE WORK 


This paper has investigated the notion of knowledge evolution in dynamic resource provisioning for cloud-based applications. The scenario under study assumes that no a priori knowledge is available regarding the elasticity policies that cloud controllers can exploit. More precisely, instead of specifying elasticity policies, as is typical in auto-scaling solutions, system operators are now only required to provide the importance weights in the reward functions. To realize this, we have proposed a fuzzy rule-based controller (the lower feedback control loop in Figure 7) coupled with a reinforcement learning algorithm (the upper knowledge evolution loop in Figure 7) for learning optimal elasticity policies. The main advantages of the proposed approach are as follows:





1) FQL4KE is robust to highly dynamic workload intensity due to its self-adaptive and self-learning capabilities.

2) FQL4KE is model-independent. The variations in the performance of the deployed applications and the unpredictability of dynamic workloads do not affect the effectiveness of the proposed approach.

3) FQL4KE is capable of automatically constructing the control rules and keeping control parameters updated through fast online learning. It executes resource allocation and learns to improve its performance simultaneously.

4) Unlike supervised techniques that learn from training data, FQL4KE does not require offline training, which saves a significant amount of time and effort.

















We plan to extend our approach in a number of ways: (i) extending FQL4KE to perform in environments that are only partially observable (for this we will exploit partially observable Markov decision processes), and (ii) exploiting clustering approaches to learn the membership functions of the antecedents in fuzzy rules (in this work we assume that they do not change once specified; to enable dynamic change, we will consider incremental clustering approaches).